NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

The Federation Strikes Back: A Survey of Federated Learning Privacy Attacks, Defenses, Applications, and Policy Landscape

https://doi.org/10.1145/3724113

Zhao, Joshua; Bagchi, Saurabh; Avestimehr, Salman; Chan, Kevin; Chaterji, Somali; Dimitriadis, Dimitris; Li, Jiacheng; Li, Ninghui; Nourian, Arash; Roth, Holger (September 2025, ACM Computing Surveys)

Deep learning has shown incredible potential across a wide array of tasks, and accompanied by this growth has been an insatiable appetite for data. However, a large amount of data needed for enabling deep learning is stored on personal devices, and recent concerns on privacy have further highlighted challenges for accessing such data. As a result, federated learning (FL) has emerged as an important privacy-preserving technology that enables collaborative training of machine learning models without the need to send the raw, potentially sensitive, data to a central server. However, the fundamental premise that sending model updates to a server is privacy-preserving only holds if the updates cannot be “reverse engineered” to infer information about the private training data. It has been shown under a wide variety of settings that this privacy premise doesnothold. In this article we provide a comprehensive literature review of the different privacy attacks and defense methods in FL. We identify the current limitations of these attacks and highlight the settings in which the privacy of an FL client can be broken. We further dissect some of the successful industry applications of FL and draw lessons for future successful adoption. We survey the emerging landscape of privacy regulation for FL and conclude with future directions for taking FL toward the cherished goal of generating accurate models while preserving the privacy of the data from its participants.
more » « less
Free, publicly-accessible full text available September 30, 2026
CroMo-Mixup: Augmenting Cross-Model Representations for Continual Self-Supervised Learning

https://doi.org/10.1007/978-3-031-72989-8_18

Mushtaq, Erum; Yaldiz, Duygu Nur; Bakman, Yavuz Faruk; Ding, Jie; Tao, Chenyang; Dimitriadis, Dimitrios; Avestimehr, Salman (October 2024, Springer Nature Switzerland)

Full Text Available
Loki: Large-scale Data Reconstruction Attack against Federated Learning through Model Manipulation

https://doi.org/10.1109/SP54263.2024.00030

Zhao, Joshua C; Sharma, Atul; Elkordy, Ahmed Roushdy; Ezzeldin, Yahya H; Avestimehr, Salman; Bagchi, Saurabh (May 2024, IEEE Security and Privacy)

Full Text Available
Don't Memorize; Mimic The Past: Federated Class Incremental Learning Without Episodic Memory

Babakniya, Sara; Fabian, Zalan; He, Chaoyang; Soltanolkotabi, Mahdi; Avestimehr, Salman (December 2023, Proceedings of Neural Information Processing Systems)

Deep learning models are prone to forgetting information learned in the past when trained on new data. This problem becomes even more pronounced in the context of federated learning (FL), where data is decentralized and subject to independent changes for each user. Continual Learning (CL) studies this so-called \textit{catastrophic forgetting} phenomenon primarily in centralized settings, where the learner has direct access to the complete training dataset. However, applying CL techniques to FL is not straightforward due to privacy concerns and resource limitations. This paper presents a framework for federated class incremental learning that utilizes a generative model to synthesize samples from past distributions instead of storing part of past data. Then, clients can leverage the generative model to mitigate catastrophic forgetting locally. The generative model is trained on the server using data-free methods at the end of each task without requesting data from clients. Therefore, it reduces the risk of data leakage as opposed to training it on the client's private data. We demonstrate significant improvements for the CIFAR-100 dataset compared to existing baselines.
more » « less
Full Text Available
Distributed Architecture Search Over Heterogeneous Distributions

Mushtaq, Erum; He, Chaoyang; Ding, Jie; Avestimehr, Salman (November 2023, Transactions on machine learning research)

Federated learning (FL) is an efficient learning framework that assists distributed machine learning when data cannot be shared with a centralized server. Recent advancements in FL use predefined architecture-based learning for all clients. However, given that clients’ data are invisible to the server and data distributions are non-identical across clients, a predefined architecture discovered in a centralized setting may not be an optimal solution for all the clients in FL. Motivated by this challenge, we introduce SPIDER, an algorithmic frame- work that aims to Search PersonalIzed neural architecture for feDERated learning. SPIDER is designed based on two unique features: (1) alternately optimizing one architecture- homogeneous global model in a generic FL manner and architecture-heterogeneous local models that are connected to the global model by weight-sharing-based regularization, (2) achieving architecture-heterogeneous local models by a perturbation-based neural architecture search method. Experimental results demonstrate superior prediction performance compared with other state-of-the-art personalization methods. Code is available at https://github.com/ErumMushtaq/SPIDER.git.
more » « less
Full Text Available
mL-BFGS: A Momentum-based L-BFGS for Distributed Large-Scale Neural Network Optimization

Niu, Yue; Fabian, Zalan; Lee, Sunwoo; Soltanolkotabi, Mahdi; Avestimehr, Salman (July 2023, Transactions on Machine Learning Research)

Quasi-Newton methods still face significant challenges in training large-scale neural networks due to additional compute costs in the Hessian related computations and instability issues in stochastic training. A well-known method, L-BFGS that efficiently approximates the Hessian using history parameter and gradient changes, suffers convergence instability in stochastic training. So far, attempts that adapt L-BFGS to large-scale stochastic training incur considerable extra overhead, which offsets its convergence benefits in wall-clock time. In this paper, we propose mL-BFGS, a lightweight momentum-based L-BFGS algorithm that paves the way for quasi-Newton (QN) methods in large-scale distributed deep neural network (DNN) optimization. mL-BFGS introduces a nearly cost-free momentum scheme into L-BFGS update and greatly reduces stochastic noise in the Hessian, therefore stabilizing convergence during stochastic optimization. For model training at a large scale, mL-BFGS approximates a block-wise Hessian, thus enabling distributing compute and memory costs across all computing nodes. We provide a supporting convergence analysis for mL-BFGS in stochastic settings. To investigate mL-BFGS’s potential in large-scale DNN training, we train benchmark neural models using mL-BFGS and compare performance with baselines (SGD, Adam, and other quasi-Newton methods). Results show that mL-BFGS achieves both noticeable iteration-wise and wall-clock speedup.
more » « less
Full Text Available
On The Effectiveness of Active Learning by Uncertainty Sampling in Classification of High-Dimensional Gaussian Mixture Data

https://doi.org/10.1109/ICASSP43922.2022.9747685

Mai, Xiaoyi; Avestimehr, Salman; Ortega, Antonio; Soltanolkotabi, Mahdi (May 2022, IEEE)

Active learning aims to reduce the cost of labeling through selective sampling. Despite reported empirical success over passive learning, many popular active learning heuristics such as uncertainty sampling still lack satisfying theoretical guarantees. Towards closing the gap between practical use and theoretical understanding in active learning, we propose to characterize the exact behavior of uncertainty sampling for high-dimensional Gaussian mixture data, in a modern regime of big data where the numbers of samples and features are commensurately large. Through a sharp characterization of the learning results, our analysis sheds light on the important question of when uncertainty sampling works better than passive learning. Our results show that the effectiveness of uncertainty sampling is not always ensured. In fact it depends crucially on the choice of i) an adequate initial classifier used to start the active sampling process and ii) a proper loss function that allows an adaptive treatment of samples queried at various steps.
more » « less
Full Text Available
Adaptive Verifiable Coded Computing: Towards Fast, Secure and Private Distributed Machine Learning

https://doi.org/10.1109/IPDPS53621.2022.00067

Tang, Tingting; Ali, Ramy E.; Hashemi, Hanieh; Gangwani, Tynan; Avestimehr, Salman; Annavaram, Murali (June 2022, 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS))

Stragglers, Byzantine workers, and data privacy are the main bottlenecks in distributed cloud computing. Some prior works proposed coded computing strategies to jointly address all three challenges. They require either a large number of workers, a significant communication cost or a significant computational complexity to tolerate Byzantine workers. Much of the overhead in prior schemes comes from the fact that they tightly couple coding for all three problems into a single framework. In this paper, we propose Adaptive Verifiable Coded Computing (AVCC) framework that decouples the Byzantine node detection challenge from the straggler tolerance. AVCC leverages coded computing just for handling stragglers and privacy, and then uses an orthogonal approach that leverages verifiable computing to mitigate Byzantine workers. Furthermore, AVCC dynamically adapts its coding scheme to trade-off straggler tolerance with Byzantine protection. We evaluate AVCC on a compute-intensive distributed logistic regression application. Our experiments show that AVCC achieves up to 4.2× speedup and up to 5.1% accuracy improvement over the state-of-the-art Lagrange coded computing approach (LCC). AVCC also speeds up the conventional uncoded implementation of distributed logistic regression by up to 7.6×, and improves the test accuracy by up to 12.1%.
more » « less
Full Text Available
Private Retrieval, Computing, and Learning: Recent Progress and Future Challenges

https://doi.org/10.1109/JSAC.2022.3142358

Ulukus, Sennur; Avestimehr, Salman; Gastpar, Michael; Jafar, Syed A.; Tandon, Ravi; Tian, Chao (March 2022, IEEE Journal on Selected Areas in Communications)

Full Text Available
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models

He, Chaoyang; Li, Shen; Soltanolkotabi, Mahdi; Avestimehr, Salman (July 2021, International Conference on Machine Learning)
null (Ed.)
The size of Transformer models is growing at an unprecedented rate. It has taken less than one year to reach trillion-level parameters since the release of GPT-3 (175B). Training such models requires both substantial engineering efforts and enormous computing resources, which are luxuries most research teams cannot afford. In this paper, we propose PipeTransformer, which leverages automated elastic pipelining for efficient distributed training of Transformer models. In PipeTransformer, we design an adaptive on the fly freeze algorithm that can identify and freeze some layers gradually during training, and an elastic pipelining system that can dynamically allocate resources to train the remaining active layers. More specifically, PipeTransformer automatically excludes frozen layers from the pipeline, packs active layers into fewer GPUs, and forks more replicas to increase data-parallel width. We evaluate PipeTransformer using Vision Transformer (ViT) on ImageNet and BERT on SQuAD and GLUE datasets. Our results show that compared to the state-of-the-art baseline, PipeTransformer attains up to 2:83- fold speedup without losing accuracy. We also provide various performance analyses for a more comprehensive understanding of our algorithmic and system-wise design. Finally, we have modularized our training system with flexible APIs and made the source code publicly available at https://DistML.ai.
more » « less
Full Text Available

« Prev Next »

Search for: All records